Agenda

Part I: Change in population, GDP per Capita and life expectancy over time

Overview of the dataset

  • Gapminder contains population, GDP per Capita and life expectancy for 142 countries across 5 continents from 1952 to 2007, at 5-year intervals.

How does world population change over time?

  • From 1952 to 2007, world population grew from 2.4 billion to 6.3 billion, an increase of 162.5%.
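The 162.5% figure follows directly from the rounded totals quoted above. A minimal check in Python (the function name `percent_growth` is illustrative and not part of the original analysis):

```python
def percent_growth(start, end):
    """Percentage change from start to end."""
    return (end - start) / start * 100.0

# Rounded world totals from the bullet above, in billions.
growth = percent_growth(2.4, 6.3)
print(f"{growth:.1f}%")
```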

How did the percent of world total population in each continent change from 1952 to 2007?

  • Each continent’s share of world population has also changed. From 1952 to 2007, Africa’s share rose from 10% to 14%, while Europe’s fell from 17% to 9%. Asia is still the most populous continent.

What are the 10 countries with the largest population? How does it change over time?

What are the 10 countries with the fastest growth rate of population from 1952 to 2007?

  • The 10 countries with the fastest population growth rates are all in Africa and Asia.

How does the median of GDP per Capita in each continent change over time?

What are the 10 countries with the highest GDP per Capita? How does it change over time?

What are the 10 countries with the fastest growth rate of GDP per Capita?

What are the 10 countries with the highest GDP? How does it change over time?

  • Over the past 50 years, the fastest-growing economies in the world have mainly been oil producers (Equatorial Guinea, Oman) and emerging economies in East and Southeast Asia that rely on manufacturing (Korea, Taiwan, Thailand, Singapore).

How does life expectancy change over time?

How does the median of life expectancy in each continent change over time?

What are the 10 countries with the longest life expectancies in 2007?

  • Countries with high life expectancies are developed countries in North America, Asia, Europe and Oceania.

What are the 10 countries with the greatest improvement of life expectancy?

  • Most of the countries with the greatest improvement in life expectancy are in Asia and Africa. These countries all experienced rapid economic development after World War II.

Part II: Socio-Economic History Hidden in the Data

What happened in Europe between 1987 and 1992? Some European countries experienced a dramatic drop in GDP per Capita between 1987 and 1992. Why?

Identify Socialist States

  • When the ‘Eastern Bloc’ dissolved around 1992, socialist states struggled to adapt to free-market systems.

  • Germany is the only former socialist country that did not experience this economic recession.

  • Most socialist states experienced negative growth from 1987 to 1992.

There was a dramatic drop in life expectancy in some African countries in the 1990s. Why?

AIDS?

  • The AIDS epidemic is taking a devastating toll on the population of many sub-Saharan countries. In the nine countries with an adult HIV prevalence of 10 per cent or more (Botswana, Kenya, Malawi, Mozambique, Namibia, Rwanda, South Africa, Zambia and Zimbabwe), the impact of AIDS is even more dramatic: more than 10 years of life expectancy have already been lost to AIDS.

Identify African countries with an adult HIV prevalence of 10 percent or more

Part III: Benford’s Law and the Leading Digits

Benford’s Law

  • The law states that in many naturally occurring collections of numbers, the leading significant digit is likely to be small. For example, in sets that obey the law, the number 1 appears as the leading significant digit about 30% of the time, while 9 appears as the leading significant digit less than 5% of the time.

  • A set of decimal numbers is said to satisfy Benford’s law if the leading digit \(d\) (\(d \in \{1, \ldots, 9\}\)) occurs with probability \(P(d) = \log_{10}\left(\frac{d+1}{d}\right) = \log_{10}\left(1+\frac{1}{d}\right)\). The leading digits in such a set thus follow a fixed distribution, from \(P(1) \approx 0.301\) down to \(P(9) \approx 0.046\).
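The full first-digit distribution can be tabulated straight from the formula. The sketch below uses only the Python standard library; the function name `benford_first_digit_probs` is an illustrative choice, not something taken from the original analysis.

```python
import math

def benford_first_digit_probs():
    """Return {d: P(d)} for leading digits 1..9 under Benford's law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

probs = benford_first_digit_probs()
for d, p in probs.items():
    print(f"{d}: {p:.3f}")
```

The nine probabilities sum to 1 because the product of the ratios \((d+1)/d\) for \(d = 1, \ldots, 9\) telescopes to 10.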

Chi-square Goodness-of-fit Test

  • Test if a sample of data came from a population with a specific distribution. The chi-square goodness-of-fit test can be applied to discrete distributions. In other words, it is used to determine whether there is a significant difference between the expected frequencies and the observed frequencies in one or more categories.

  • \(H_0\): The leading digits of population in 2007 follow Benford’s law.
  • \(H_a\): The leading digits of population in 2007 do not follow Benford’s law.

  • Chi-square Statistic:

    • \(Q_{k-1} = \sum_{i=1}^{k}\frac{(O_i-np_i)^2}{np_i}\)

    • \(k\): number of digits (=9)

    • \(k-1\): degrees of freedom (=8)

    • \(O_i\): observed count of digit \(i\)

    • \(n\): total number of observations

    • \(p_i\): probability of digit \(i\) predicted by Benford’s law.

  • Let’s estimate the p-value by Monte Carlo.
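One way to carry out the test described above is sketched here in Python, using only the standard library. It computes the statistic \(Q_{k-1}\) from observed digit counts, then estimates the p-value by repeatedly drawing \(n\) leading digits from the Benford distribution and counting how often the simulated statistic is at least as large as the observed one. All function names, the number of simulations and the example counts are assumptions made for illustration. They are not the original analysis or its data.

```python
import math
import random

# p_1 .. p_9 predicted by Benford's law.
BENFORD = [math.log10(1 + 1 / d) for d in range(1, 10)]

def leading_digit(x):
    """First significant digit of a nonzero number, read from its shortest repr."""
    for ch in repr(abs(x)):
        if ch in "123456789":
            return int(ch)
    raise ValueError("value has no leading significant digit")

def digit_counts(values):
    """Observed counts O_1 .. O_9 of leading digits."""
    counts = [0] * 9
    for v in values:
        counts[leading_digit(v) - 1] += 1
    return counts

def chi_square_stat(counts, probs=BENFORD):
    """Q = sum over digits of (O_i - n*p_i)^2 / (n*p_i)."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))

def monte_carlo_p_value(observed_counts, n_sim=10_000, seed=0):
    """Share of simulated statistics at least as large as the observed one."""
    rng = random.Random(seed)
    n = sum(observed_counts)
    q_obs = chi_square_stat(observed_counts)
    exceed = 0
    for _ in range(n_sim):
        sim = [0] * 9
        # Draw n digit indices (0..8) with Benford probabilities, i.e. under H0.
        for i in rng.choices(range(9), weights=BENFORD, k=n):
            sim[i] += 1
        if chi_square_stat(sim) >= q_obs:
            exceed += 1
    return exceed / n_sim

# Hypothetical digit counts for illustration only (not the Gapminder results).
example_counts = [43, 25, 18, 14, 11, 10, 8, 7, 6]
p_hat = monte_carlo_p_value(example_counts, n_sim=2000)
print(f"Q = {chi_square_stat(example_counts):.3f}, estimated p-value = {p_hat:.3f}")
```

With real data, `digit_counts` would be applied to the column of interest (for example the 2007 population values) before calling `monte_carlo_p_value`. Fixing the seed keeps the estimate reproducible.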

P-value

The estimated p-value is 0.77.

The p-value given by the built-in function pchisq is 0.76, which is close to the estimate from our Monte Carlo sampling distribution.

Test Statistic and P-value for 2007 GDP per Capita

The observed chi-square statistic is 11.28.

The estimated p-value is 0.19.

The p-value given by the built-in function pchisq is 0.19, which is very similar to what we got from Monte Carlo.
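The pchisq figure can be reproduced by hand if the 11.28 reported above is treated as the observed chi-square statistic with 8 degrees of freedom. For an even number of degrees of freedom \(2m\), the upper tail of the chi-square distribution has the closed form \(e^{-q/2}\sum_{j=0}^{m-1}\frac{(q/2)^j}{j!}\). The Python sketch below implements only that special case. It is a stand-in for R's `pchisq(q, df, lower.tail = FALSE)`, and the function name is an illustrative choice.

```python
import math

def chi_square_upper_tail_even_df(q, df):
    """P(X > q) for X ~ chi-square with a positive, even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this shortcut only covers positive even df")
    half = q / 2.0
    terms = (half ** j / math.factorial(j) for j in range(df // 2))
    return math.exp(-half) * sum(terms)

p = chi_square_upper_tail_even_df(11.28, df=8)
print(round(p, 2))  # 0.19
```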

The Whole Dataset

Compare empirical distribution and theoretical distribution

Sampling distribution:

Test statistic and p-value, population

  • The observed chi-square statistic is 16.61. The estimated p-value is 0.04.

  • The p-value given by the built-in function pchisq is 0.03, which is very similar to what we got from Monte Carlo.

Test statistic and p-value, GDP per Capita

  • The observed chi-square statistic is 33.81. The estimated p-value is 0.

  • The p-value given by the built-in function pchisq also rounds to 0, in agreement with the Monte Carlo estimate.

  • In practice, applications of Benford’s Law for fraud detection routinely use more than the first digit.
  • Package: benford.analysis
  • The number of leading digits used in the test can be selected.
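When more than the first digit is used, the expected probabilities come from the same logarithmic rule applied to the leading block of digits: a block \(n\) (for two digits, \(n = 10, \ldots, 99\)) is expected with probability \(\log_{10}(1 + 1/n)\). The Python sketch below tabulates these expected values. It shows the underlying formula only and is not the benford.analysis implementation. The function name is an assumption made for illustration.

```python
import math

def benford_leading_block_probs(num_digits=2):
    """Expected Benford probability of each possible leading block of digits."""
    if num_digits < 1:
        raise ValueError("num_digits must be at least 1")
    lo, hi = 10 ** (num_digits - 1), 10 ** num_digits
    return {n: math.log10(1 + 1 / n) for n in range(lo, hi)}

two_digit = benford_leading_block_probs(2)
print(len(two_digit), f"{two_digit[10]:.4f}", f"{two_digit[99]:.4f}")
```

A two-digit test compares 90 categories instead of 9, so the chi-square statistic would have 89 degrees of freedom.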